CHAPTER 23 Survival Regression 335
per day has that 1.05 multiplication applied 20 times, which is like multiplying by
$1.05^{20}$, which equals 2.65. One pack contains 20 cigarettes, so if you change the
units in which you record smoking levels from cigarettes per day to packs per day,
you would use units that are 20 times larger. In that case, the corresponding
regression coefficient is 20 times larger, and the HR is raised to the 20th power
(2.65 instead of 1.05 in this example).
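The unit change can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the usual proportional-hazards convention, where the HR for a one-unit increase equals $e^{b}$ for regression coefficient $b$. The per-cigarette HR of 1.05 is the example value from the text; the variable names are illustrative only.

```python
import math

# Example value from the text: HR per additional cigarette per day.
hr_per_cigarette = 1.05
b_per_cigarette = math.log(hr_per_cigarette)   # coefficient b = ln(HR)

cigs_per_pack = 20
b_per_pack = cigs_per_pack * b_per_cigarette   # units 20x larger -> coefficient 20x larger
hr_per_pack = math.exp(b_per_pack)             # identical to 1.05 ** 20

print(f"coefficient per cigarette: {b_per_cigarette:.4f}")
print(f"coefficient per pack:      {b_per_pack:.4f}")
print(f"HR per pack:               {hr_per_pack:.2f}")
```

Multiplying the coefficient by 20 and exponentiating gives the same result as raising the HR to the 20th power, because $e^{20b} = (e^{b})^{20}$.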
And a two-pack-per-day smoker’s hazard is 2.65 times that of a one-pack-per-day
smoker, whose hazard is in turn 2.65 times that of a nonsmoker. This translates to
an increase by a factor of $2.65^{2}$ (approximately sevenfold) in the chances of dying
at any instant for the two-pack-per-day smoker compared to a nonsmoker.
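Under the same assumption ($\text{HR} = e^{b}$ per cigarette), the hazard ratio between any two smoking levels depends only on their difference, and HRs multiply along a chain of comparisons. The sketch below uses the text's example value of 1.05 per cigarette; the function name `hazard_ratio` is a hypothetical helper, not part of any library.

```python
import math

b = math.log(1.05)  # assumed per-cigarette coefficient (example value from the text)

def hazard_ratio(cigs_a: float, cigs_b: float, coef: float = b) -> float:
    """HR of someone smoking cigs_a per day relative to someone smoking cigs_b per day."""
    return math.exp(coef * (cigs_a - cigs_b))

two_vs_one = hazard_ratio(40, 20)   # equals 1.05 ** 20
one_vs_none = hazard_ratio(20, 0)   # equals 1.05 ** 20
two_vs_none = hazard_ratio(40, 0)   # equals 1.05 ** 40

print(f"2 packs vs 1 pack: {two_vs_one:.2f}")
print(f"2 packs vs none:   {two_vs_none:.2f}")
# HRs multiply along the chain: (2 vs 1) * (1 vs 0) = (2 vs 0)
print(math.isclose(two_vs_one * one_vs_none, two_vs_none))
```

Squaring the rounded value 2.65 gives about 7.02, while the unrounded $1.05^{40}$ is about 7.04; either way, the result is approximately sevenfold.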
Executing a Survival Regression
As with all statistical methods dealing with time-to-event data, your dependent
variable is actually a pair of variables:
» Event status: The event status variable is coded this way:
• Equal to 1 if the event was known to occur during the observation period
(uncensored)
• Equal to 0 if the event didn’t occur during the observation period (censored)
» Time-to-event: In participants who experienced the event during the
observation period, this is the time from the start of observation to the
occurrence of the event. In participants who did not experience the event
during the observation period, this is the time from the start of observation to
the last time the participant was observed. We describe time-to-event data in
more detail in Chapter 21.
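To make the coding rules concrete, here is a minimal sketch that derives the two dependent variables for one participant from calendar dates. It assumes you have recorded a start-of-observation date, an event date (or nothing, if the event was not observed), and a last-observed date; the function name `survival_pair` and its inputs are hypothetical, and time is measured in days purely for illustration.

```python
from datetime import date
from typing import Optional, Tuple

def survival_pair(start: date, event_date: Optional[date], last_seen: date) -> Tuple[int, int]:
    """Return (event_status, time_in_days) for one participant.

    Hypothetical inputs: start of observation, date the event occurred
    (None if it was not observed), and date the participant was last observed.
    """
    if event_date is not None:
        return 1, (event_date - start).days   # uncensored: time from start to the event
    return 0, (last_seen - start).days        # censored: time from start to last observation

# Two hypothetical participants
print(survival_pair(date(2020, 1, 1), date(2020, 3, 1), date(2020, 3, 1)))
print(survival_pair(date(2020, 1, 1), None, date(2021, 1, 1)))
```

The first participant has the event 60 days after the start (status 1); the second is still event-free when last seen 366 days after the start (status 0, censored).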
And as with all regression methods, you designate one or more variables as the
predictors. The rules for representing the predictor variables are the same as
described in Chapter 18:
» For continuous numerical variables, choose units of a convenient magnitude.
» For categorical predictors, carefully consider how you recode the data,
especially in terms of selecting a reference group. Consider a five-level age
group variable. Would you want to model it as a single ordinal numeric variable,
assuming a linear relationship with the outcome? Or would you prefer using
indicator variables, allowing each level to have its own coefficient relative to
the reference level? Flip to Chapter 8 for more on recoding categorical variables.
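The two choices for the age-group variable can be sketched side by side. The level labels and the choice of the youngest group as the reference are hypothetical, picked only for illustration. Ordinal coding produces one numeric predictor (so the model estimates one coefficient per step up in level), while indicator coding produces four 0/1 predictors, one per non-reference level.

```python
from typing import Dict

AGE_GROUPS = ["<30", "30-44", "45-59", "60-74", "75+"]  # hypothetical five-level variable
REFERENCE = "<30"                                        # hypothetical reference group

def ordinal_code(group: str) -> int:
    """One predictor coded 1..5: assumes the same change in log hazard per step up."""
    return AGE_GROUPS.index(group) + 1

def indicator_code(group: str) -> Dict[str, int]:
    """Four 0/1 indicators, one per non-reference level; the reference is all zeros."""
    return {f"age_{g}": int(group == g) for g in AGE_GROUPS if g != REFERENCE}

for g in ["<30", "60-74"]:
    print(g, ordinal_code(g), indicator_code(g))
```

With indicator coding, each of the four coefficients compares one age group with the reference group, so no linear trend across levels is imposed.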